On the Application of Locality Sensitive Hashing to Behavioral Web Analytics

Authors

  • Shantanu Gore
  • Thomas Jefferson
  • Myriam Abramson

Abstract

In today’s constantly connected world, dependence on web-based technologies is ubiquitous, creating opportunities for both malicious and benign activity. As a result, it is essential that we be able to identify users on the web. Simple methods exist, such as tracking a user by user ID or by IP address, but a user who wishes to can easily evade them by creating multiple IDs or by operating from different IP addresses. Out of habit, however, users rarely change the way they browse the web, for example the time of day at which they visit various genres of pages. In this project, we evaluate locality sensitive hashing (LSH) as a means to uniquely identify and authenticate users based only on the day of the week, the time of day, and the genres of the websites they visit. In addition, we provide a novel extension of LSH, Mode Closest Hash (MCH).
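The abstract does not state which LSH family or feature encoding the authors use, so the following is only a minimal sketch under stated assumptions. Each user's browsing is encoded as a count vector over hypothetical (day of week, hour of day, genre) bins, and random-hyperplane LSH (a SimHash-style family for cosine similarity) is swapped in as an illustrative hash family. A query session is attributed to the enrolled user whose stored signature is nearest in Hamming distance. The names `GENRES`, `featurize`, `HyperplaneLSH`, `hamming`, and `identify`, the genre list, and the toy users are all hypothetical. The sketch does not implement Mode Closest Hash (MCH), whose definition is not given here.

```python
import random
from collections import Counter

# Assumption: hypothetical genre vocabulary; the paper's genre taxonomy is not given here.
GENRES = ["news", "sports", "shopping", "social", "tech"]
DIM = 7 * 24 * len(GENRES)  # one bin per (day of week, hour of day, genre)


def featurize(events):
    """Turn (day, hour, genre) browsing events into a sparse count vector."""
    vec = Counter()
    for day, hour, genre in events:
        idx = (day * 24 + hour) * len(GENRES) + GENRES.index(genre)
        vec[idx] += 1
    return vec


class HyperplaneLSH:
    """Random-hyperplane LSH (assumed family): vectors at a small angle share most bits."""

    def __init__(self, dim, n_bits=32, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is represented by a dense Gaussian normal vector.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def signature(self, vec):
        # Each bit records which side of one random hyperplane the sparse vector lies on.
        return tuple(
            1 if sum(plane[i] * v for i, v in vec.items()) >= 0 else 0
            for plane in self.planes
        )


def hamming(a, b):
    """Number of positions at which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def identify(lsh, profiles, events):
    """Return the enrolled user whose signature is closest to the query's signature."""
    query = lsh.signature(featurize(events))
    return min(profiles, key=lambda user: hamming(profiles[user], query))


# Usage with toy, made-up users (not data from the paper).
lsh = HyperplaneLSH(DIM, n_bits=64, seed=42)
alice = [(0, 9, "news"), (0, 9, "news"), (2, 20, "sports")]
bob = [(5, 23, "social"), (6, 14, "shopping")]
profiles = {
    "alice": lsh.signature(featurize(alice)),
    "bob": lsh.signature(featurize(bob)),
}
# Same habits at twice the volume: the direction is unchanged, so the signature matches.
print(identify(lsh, profiles, alice + alice))  # prints alice
```

Because each bit depends only on the direction of the count vector, a user who repeats the same weekly pattern at a different volume keeps the same signature, which is one reason a cosine-style family is a plausible fit for habit-driven behavior. A larger deployment would typically split signatures into bands and bucket them rather than scanning every enrolled user.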


Similar resources

A Layered Locality Sensitive Hashing based Sequence Similarity Search Algorithm for Web Sessions

In this article we propose a Layered Locality Sensitive Hashing Algorithm to perform similarity search on the web log sequence data. Locality Sensitive Hashing has been found to be an efficient technique for the approximate nearest neighbor search over a large database, as it has sub-linear dependence on the data size even for high dimension. Mining the large web log data to provide customised ...

Streaming First Story Detection with application to Twitter

With the recent rise in popularity and size of social media, there is a growing need for systems that can extract useful information from this amount of data. We address the problem of detecting new events from a stream of Twitter posts. To make event detection feasible on web-scale corpora, we present an algorithm based on locality-sensitive hashing which is able to overcome the limitations of tr...

Trading accuracy for faster entity linking

Named entity linking (NEL) can be applied to documents such as financial reports, web pages and news articles, but state of the art disambiguation techniques are currently too slow for web-scale applications because of a high complexity with respect to the number of candidates. In this paper, we accelerate NEL by taking two successful disambiguation features (popularity and context comparabilit...

Efficient Online Locality Sensitive Hashing via Reservoir Counting

We describe a novel mechanism called Reservoir Counting for application in online Locality Sensitive Hashing. This technique allows for significant savings in the streaming setting, allowing for maintaining a larger number of signatures, or an increased level of approximation accuracy at a similar memory footprint.

Scalable Techniques for Clustering the Web

Clustering is one of the most crucial techniques for dealing with the massive amount of information present on the web. Clustering can either be performed once offline, independent of search queries, or performed online on the results of search queries. Our offline approach aims to efficiently cluster similar pages on the web, using the technique of Locality-Sensitive Hashing (LSH), in which we...
Publication date: 2014